Add BRIN support for spoint and sbox #55

vitcpp · 2023-08-16T09:08:24Z

This is the adaptation of the patch introduced in akorotkov/pgsphere#8 by @gbroccolo with some small fixes. I appreciate any help with reviewing and testing of this patch.

esabol

Some minor tweaks, but otherwise looks good to me.

It would be nice if BRINs supported scircles and the other s-types as well, but maybe in a follow-up PR if there's interest.

esabol · 2023-08-16T09:31:23Z

README.pg_sphere

@@ -11,6 +11,7 @@ It provides:

 This is an R-Tree implementation using GiST for spherical objects like
 spherical points and spherical circles with useful functions and operators.
+It also support the Block Range INdexing (BRIN) for large datasets.


Change "support" to "supports".

esabol · 2023-08-16T09:32:59Z

doc/indices.sgm

+          of data blocks (<literal>pages</literal>) on physical storage in order to
+          organize data searches on ranges of summarized data that can be easily skipped
+          on the base of search filters (see <ulink
+          url="https://www.postgresql.org/docs/9.5/static/brin-intro.html"><citetitle>


Update URL to https://www.postgresql.org/docs/current/brin-intro.html

esabol · 2023-08-16T09:43:39Z

doc/indices.sgm

+            of pages are specified, the higher granularity is reached during
+            the searches, and performance's gap between GiST indexes and BRINs
+            is lower (consider that BRINs size increases as well). Different
+            summarizations can be used with the following command:


This whole paragraph (lines 93-97) could be better phrased. Here is what I recommend:

By default, BRINs summarize blocks of 128 pages. The smaller the number of pages specified, the higher the granularity in searches, and the gap in performance between GiST indexes and BRINs will be decreased. (Note that the size of the BRINs increases as well.) Different summarizations can be specified with the following command:

Yes, it does make sense. I was thinking if it makes sense as well to add a reference to the fact that data in the pages should have some kind of "natural clustering" in the pages themselves: since the Block Range INdex keep the summarised information about the MBR that includes all the geometries in single page, then in case the geometries are widely distributed in the pages it will result for most of them an MBR which is pretty identical to the MBR of the whole dataset, making the index useless. This might happen for example when the geometries are imported in bulk in the DB through parallel importer.

In this case, the solution is to use a GiST index to CLUSTER the table at the end of the bulk import, operation that can take some time for very large DBs but it needs to be done only once.

For future inserts, autovacuum takes care for regular maintenance of the index with unsummarized pages. Further details here: https://www.postgresql.org/docs/current/brin-intro.html

@esabol Not sure, the phrase "(Note that the size of the BRINs increases as well.) " requires the braces. May be, just remove braces? It will be a regular sentense.

i think you mean parentheses, not braces. They can be removed. I only included them because the original text had them.

gbroccolo

I basically agree the rest of the comments from @esabol. I added a single one regarding linking users to further documentation about the maintenance of the index, which was missing in the original PR.

gbroccolo · 2023-08-16T21:05:29Z

doc/indices.sgm

+            of pages are specified, the higher granularity is reached during
+            the searches, and performance's gap between GiST indexes and BRINs
+            is lower (consider that BRINs size increases as well). Different
+            summarizations can be used with the following command:


Yes, it does make sense. I was thinking if it makes sense as well to add a reference to the fact that data in the pages should have some kind of "natural clustering" in the pages themselves: since the Block Range INdex keep the summarised information about the MBR that includes all the geometries in single page, then in case the geometries are widely distributed in the pages it will result for most of them an MBR which is pretty identical to the MBR of the whole dataset, making the index useless. This might happen for example when the geometries are imported in bulk in the DB through parallel importer.

In this case, the solution is to use a GiST index to CLUSTER the table at the end of the bulk import, operation that can take some time for very large DBs but it needs to be done only once.

For future inserts, autovacuum takes care for regular maintenance of the index with unsummarized pages. Further details here: https://www.postgresql.org/docs/current/brin-intro.html

gbroccolo · 2023-08-16T21:52:59Z

Some minor tweaks, but otherwise looks good to me.

It would be nice if BRINs supported scircles and the other s-types as well, but maybe in a follow-up PR if there's interest.

Yes, it is possible to add the support for the other s-types, it's just enough to use the conversion to the 3D-MBR of the spherical geometry defined in src/key.c.

But we should add all the BRIN support functions for each type, so maybe it's better to create single PR for each of it.

Also (and this can be considered an additional note for this PR) it might be worth to create different source files for each data type in order to avoid that src/brin.c could end in thousands of line of code in a single file.

vitcpp · 2023-08-17T06:43:07Z

It would be nice if BRINs supported scircles and the other s-types as well, but maybe in a follow-up PR if there's interest.

I agree, lets do it as a separate task.

vitcpp · 2023-08-17T06:55:43Z

Some minor tweaks, but otherwise looks good to me.
It would be nice if BRINs supported scircles and the other s-types as well, but maybe in a follow-up PR if there's interest.

Yes, it is possible to add the support for the other s-types, it's just enough to use the conversion to the 3D-MBR of the spherical geometry defined in src/key.c.

But we should add all the BRIN support functions for each type, so maybe it's better to create single PR for each of it.

Also (and this can be considered an additional note for this PR) it might be worth to create different source files for each data type in order to avoid that src/brin.c could end in thousands of line of code in a single file.

I like the idea to add support in different PR. Lets compete and deliver this PR. I also like the idea to create different source files for each data type. I would prefer to have a big number of small files with clear names, than to have one file with a lot of code. I propose to keep this PR unchanged and make such work in the future PRs.

vitcpp · 2023-08-17T07:05:58Z

@esabol, @gbroccolo I did some fixes. Thank you! I also rebased the branch. I removed the unnecessary braces in the original sentense: (Note that the size of the BRINs increases as well.). Please, let me know if I did wrong. I will rollback it.

esabol

One more tweak to the documentation, please.

esabol · 2023-08-17T15:33:00Z

doc/indices.sgm

-          time of operators <link
+          and Block Range INdexing (<literal>BRIN</literal>) algorithms to create
+          spherical indices.
+          <literal>GiST</literal> index represents the R-Tree implementation for


Change "index represents the" to "indexes utilize an" on this line.

No problem, I will change it. What do you think if to change 'R-Tree' to 'R-tree'?

What do you think if to change 'R-Tree' to 'R-tree'?

Yes, good idea!

vitcpp · 2023-08-17T17:34:07Z

README.pg_sphere

@@ -11,7 +11,7 @@ It provides:

 This is an R-Tree implementation using GiST for spherical objects like


I propose to change R-Tree to R-tree.

Yes, please.

vitcpp · 2023-08-17T17:46:18Z

doc/indices.sgm

-            the searches, and performance's gap between GiST indexes and BRINs
-            is lower (consider that BRINs size increases as well). Different
-            summarizations can be used with the following command:
+            By default, BRINs summarize blocks of 128 pages. The smaller the


I would also propose to change BRINs to 'BRIN indexes' !?

Yeah, the PostgreSQL documentation uses "BRIN indexes", so I think that would be correct.

esabol

Sorry, I found some more things in the documentation that should be fixed, I think.

esabol · 2023-08-17T20:47:47Z

doc/indices.sgm

+          <literal>GiST</literal> indexes utilize an R-tree implementation for
+          spherical objects, while <literal>BRIN</literal> indexes are based on "summarization"
+          of data blocks (<literal>pages</literal>) on physical storage in order to
+          organize data searches on ranges of summarized data that can be easily skipped


Either change "data blocks" to "data block" or add an apostrophe after "data blocks" to make it possessive. "data block pages" or "data blocks' pages".

@esabol The 'pages' word is written in the brackets. It is how the text is shown in the generated pdf: "summarization" of data blocks (pages) on physical storage...

I think the word pages is used here as the synonym of data blocks. Not sure, we should make this change.

You're right. I didn't see the parentheses. Do not change this.

esabol · 2023-08-17T20:51:23Z

doc/indices.sgm

+          on the base of search filters (see <ulink
+          url="https://www.postgresql.org/docs/current/brin-intro.html"><citetitle>
+          PostgreSQL documentation</citetitle></ulink> for further details on BRIN indexes).
+          As a consequence, BRIN indexes result to be really small indexes (up to 1000 times smaller


Change "result to be really" to "are very".

I'm not sure, I understand the final sentence. I would like to change this sencence to something like this: BRIN indexes consume significantly less physical storage than GiST indexes.

I would like to change this sencence to something like this: BRIN indexes consume significantly less physical storage than GiST indexes.

Fine. 👍

@vitcpp it's not only related to less used storage. It's more because indexes are more efficiently used as far as more 8kB pages of the index are in-memory kept: the more RAM you have available to host the shared buffers in PostgreSQL, the more an index can be used for large amount of data, even if the resulting index is big. The good point of using BRINs is because they fits easily in the shared buffers even in case of large data stored in relatively small machines.

But regarding the correction: it's fine, and it intrinsecally includes what I said above.

May be to simplify the sentense like this:
As a consequence, BRIN indexes are small (up to 1000 times smaller ...)?

I've prepared the commit with such change.

May be to simplify the sentense like this: As a consequence, BRIN indexes are small (up to 1000 times smaller ...)?

I've prepared the commit with such change.

No, that's not proper English. Either go with what I wrote in my original comment or what you wrote in the second comment in this thread.

Oh, sorry. I've changed as you proposed. Thank you.

esabol · 2023-08-17T20:51:42Z

doc/indices.sgm

+          and Block Range INdexing (<literal>BRIN</literal>) algorithms to create
+          spherical indices.
+          <literal>GiST</literal> indexes utilize an R-tree implementation for
+          spherical objects, while <literal>BRIN</literal> indexes are based on "summarization"


Add "the" just before "summarization" and re-word-wrap the whole paragraph if lines are longer than 79 characters.

@esabol I propose not to wrap paragraphs right now because some other unchanged paragraphs do not follow this rule. It will produce more diffs to make the review process more difficult. I see that postgresql doc try to follow this rule but with tag identation is 1 characher. I have some plans to discuss with my colleagues existing best practices and adopted rules in postgresql community to create docs and to reformat pgSphere docs to follow these rules. In this task we can fix long lines.

OK, just add the "the" then.

gbroccolo · 2023-08-20T20:19:35Z

@vitcpp for me the PR is fine and can be merged by any committer of this repo at any moment.

vitcpp · 2023-08-21T10:26:42Z

I propose to add a test for sbox BRIN as well. I've added a test for sbox BRIN index but I do not see that the index was selected by the planner. Sequential scan is still used. I just copied the test for spoint with some trivial changes. I haven't digged deeply how the test works. May be I missed something.

esabol · 2023-08-21T10:34:40Z

I've added a test for sbox BRIN index but I do not see that the index was selected by the planner. Sequential scan is still used. I just copied the test for spoint with some trivial changes. I haven't digged deeply how the test works. May be I missed something.

I think the table needs to be clustered on the parameter on which you have created the BRIN index.

vitcpp · 2023-08-22T08:37:21Z

@esabol I think, BRIN index should be created without clustering but I'm not sure about it. BRIN index is not used in the test brin_sbox. Seq scan is used instead. I still can't achieve it in the test. We may marge PR as it is now but later sbox_brin should be improved.

esabol · 2023-08-22T15:19:32Z

The few examples of BRIN usage I can find on the Internet all cluster the table. 🤷

Apparently, you can SET LOCAL enable_seqscan = OFF; to force the BRIN index to be used?

https://postgrespro.com/list/thread-id/2638982

esabol · 2023-08-22T15:22:41Z

And the StackOverflow answers here all say that "the BRIN index key has to correspond to the physical ordering of the data," which is what clustering achieves.

https://stackoverflow.com/questions/72614493/brin-index-usage

vitcpp · 2023-08-22T18:01:41Z

@esabol Yes, you are right in general. BRIN index follows the physical ordering of the data. But BRIN index is created after the table is populated with some data. If you think to use CLUSTER command, it is not the case. This command requires already created index and it clusters the data in the table in according with the already created index.

As I know, the basic idea of BRIN index is to separate the whole table into a number of ranges. Each range may contain one or more pages (blocks). BRIN index contains some summary of each range: min/max values or bounding rectangle of the geometrical data, stored in the pages of the range. Such separation allows to select ranges for seq scan or ignore it very fast. It is why bitmap scan is used with BRIN index. Such indexing works well then table data are already sorted well because BRIN index operates by pages (blocks).

I guess, there are two possible sources of the problem:

data is too small - optimizer selects seq scan unconditionally
the definition of sbox BRIN index is not complete

esabol · 2023-08-22T22:37:26Z

@esabol Yes, you are right in general. BRIN index follows the physical ordering of the data. But BRIN index is created after the table is populated with some data. If you think to use CLUSTER command, it is not the case. This command requires already created index and it clusters the data in the table in according with the already created index.

Correct, but you can drop the index that you use to cluster the table with after the clustering is done, like this:

CREATE INDEX idx_sometable_someparam ON sometable(someparam);
CLUSTER sometable USING idx_sometable_someparam;
DROP INDEX idx_sometable_someparam;
CREATE INDEX idxbrin_sometable_someparam ON sometable USING brin(someparam);

If sometable is already ordered when loaded into the database, then you don't need to do this, but lots of BRIN examples do exactly this. Some examples also suggest VACUUM-ing and ANALYZE-ing the table as well.

Did you try SET LOCAL enable_seqscan = OFF;?

vitcpp · 2023-08-23T13:49:17Z

@esabol I did what you proposed. I clustered the table first then I created the BRIN index. I also increased the amount of the data in the table. I duplicated the original set of rows many times. The same result - I didn't see that BRIN index was applied.

gbroccolo · 2023-08-23T20:56:56Z

@esabol I did what you proposed. I clustered the table first then I created the BRIN index. I also increased the amount of the data in the table. I duplicated the original set of rows many times. The same result - I didn't see that BRIN index was applied.

Hi @esabol and @vitcpp, sorry for my late reply here, but it's kind of busy period for me in these days.

Regarding your test: you are both right saying that BRINs perform well at the moment that the data is phisically organised in the 8kB pages in order to avoid to have block of pages would correspond to similar MBRs including their entries, as I tried to explain here. Note also that CLUSTER on preexisting indexes is not the only way to achieve this: also VACUUM and autovacuum bring to the same result, because both of them trigger pages' resumming as explained here.

However, this is not going to solve the problem: here BRINs are not used simply because they are not convenient (in terms of I/O and access to memory operations) from PostgreSQL point of view: for small datasets, inspecting the index and select the single blocks of pages to be then sequentially scanned (what is called bitmapscan in the planner) is generally more expensive than a direct sequential scan of all pages of a table (seqscan). It is something that generally happens also for GiST indexes, where for small datasets an indexscan is more expensive than a seqscan.

Now, we have to think about what we want to achieve here: for a regression test is not required to test when the planner choose a plan rather than another (which is something that depends on a lot of factors: underlying hardware, size of the data, nature of the data, configuration of the PostgreSQL engine, ...), but to check that the result obtained through the index would be identical to the one obtained by the sequential scan. This is what I tried to achieve for example in this test I wrote for spoint: I filled a test table with 77 spherical point (more or less close to each other, just to have a good summarization as starting point, but it's not necessary for the purpose of the test) then I was checking if the result with a sequential scan is identical to the one through a bitmap scan.

Hope I helped with this long comment.

vitcpp · 2023-08-24T16:54:23Z

@esabol , @gbroccolo It seems I've found and fixed the problem with sbox support for BRIN index. Some operators in brin_inclusion_spheric_ops were absent (please, see my last commit). Now the BRIN index seems to be chosen by the planner.

esabol · 2023-08-24T17:34:41Z

It seems I've found and fixed the problem with sbox support for BRIN index.

Fantastic! Great work, @vitcpp !

vitcpp · 2023-08-25T08:32:13Z

@esabol , @gbroccolo I've prepared the PR for merge. I squashed my commits into one. I did no more changes except of adding sbox_brin to the list of tests of installcheck rule. I think the PR is ready for merge. If everything is ok, I will merge. Thank you very much for your help!

esabol

Looks great!

vitcpp · 2023-08-25T11:41:02Z

Not sure, what hapenned, but moc100 test is failed now unfortunately.

esabol · 2023-08-25T15:03:01Z

Not sure, what happened, but moc100 test is failed now unfortunately.

Did the PostgreSQL version used in the PG16 build change? 16 beta 3 was released on August 10. When was the last time the PG16 build passed? Maybe it took 2 weeks before 16 beta 3 showed up in Travis CI? If the only reason for the failure is going from 16 beta 2 to beta 3, then this change in the output might need to be reported to the PostgreSQL dev team.

My first thought was that it was due to a side effect of the SQL in the BRIN test, but that can't be since PR #56 also failed its PG16 build.

If this is an intentional change by the PostgreSQL developers in PostgreSQL 16, then we're going to need different expected outputs for pre-16 and 16+. (How annoying!)

vitcpp · 2023-08-25T15:57:09Z

@esabol I did some changes in the PR 2 days ago and PG16 tests were passed ok. My last change is to squash my commits. I guess, something is changed on travis-ci. Furthermore, my last PR #56 is failed as well. I will try to fix the differences in the output - we do not need to validate the behaviour of postgresql here. If we fix the current behaviour we can see some changes in the future.

vitcpp · 2023-08-25T18:45:55Z

@esabol I adjusted expected/moc100_4.out file which fixed the issue. I'm going to do this change in a separate PR. I propose to fix moc100 first, then rebase and merge this PR.

Adjust style using pgindent, execute regression test for BRIN code just for PG>9.5

Move brin.c(h) files into src subdirectory Add brin support in upgrade script Fix some code problems Fix test_init Fix Indices section in the doc Add test for sbox BRIN index Fix found problems in the doc after review Fix BRIN support for sbox

gbroccolo

LGTM now. Thanks @vitcpp.

vitcpp changed the title ~~Brin support~~ Add BRIN support for spoint and sbox Aug 16, 2023

vitcpp changed the title ~~Add BRIN support for spoint and sbox~~ Add BRIN support for spoint and sbox (#52) Aug 16, 2023

vitcpp changed the title ~~Add BRIN support for spoint and sbox (#52)~~ Add BRIN support for spoint and sbox (issue #52) Aug 16, 2023

vitcpp changed the title ~~Add BRIN support for spoint and sbox (issue #52)~~ Add BRIN support for spoint and sbox Aug 16, 2023

vitcpp linked an issue Aug 16, 2023 that may be closed by this pull request

Add BRIN support for spoint and sbox? #52

Closed

esabol suggested changes Aug 16, 2023

View reviewed changes

esabol mentioned this pull request Aug 16, 2023

Add BRIN support for spoint and sbox akorotkov/pgsphere#8

Closed

gbroccolo reviewed Aug 16, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch from 72a4e65 to ea052da Compare August 17, 2023 07:02

esabol suggested changes Aug 17, 2023

View reviewed changes

vitcpp commented Aug 17, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch from ea052da to 9abdba5 Compare August 17, 2023 17:44

vitcpp commented Aug 17, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch from 9abdba5 to 4cb5ed3 Compare August 17, 2023 20:39

esabol suggested changes Aug 17, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch from 4cb5ed3 to cc67320 Compare August 21, 2023 09:34

vitcpp force-pushed the brin-support branch from 048ecad to 1d28b7a Compare August 21, 2023 20:26

esabol approved these changes Aug 21, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch from bbdd3c3 to 6d9a76f Compare August 23, 2023 13:47

vitcpp force-pushed the brin-support branch from 6d9a76f to 50296ee Compare August 23, 2023 13:53

esabol approved these changes Aug 24, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch 3 times, most recently from 999173a to a6f9cba Compare August 25, 2023 08:27

esabol approved these changes Aug 25, 2023

View reviewed changes

vitcpp force-pushed the brin-support branch from a6f9cba to fda873b Compare August 25, 2023 10:59

vitcpp force-pushed the brin-support branch from fda873b to 249c232 Compare August 25, 2023 16:08

Giuseppe Broccolo and others added 2 commits August 26, 2023 09:21

Add BRIN support for spoint and sbox

6ebf5c2

Adjust style using pgindent, execute regression test for BRIN code just for PG>9.5

vitcpp force-pushed the brin-support branch from 249c232 to fa429a5 Compare August 26, 2023 06:24

esabol approved these changes Aug 26, 2023

View reviewed changes

gbroccolo approved these changes Aug 26, 2023

View reviewed changes

vitcpp merged commit c79fccc into postgrespro:master Aug 27, 2023

vitcpp deleted the brin-support branch August 27, 2023 08:32

		@@ -11,7 +11,7 @@ It provides:

		This is an R-Tree implementation using GiST for spherical objects like

Add BRIN support for spoint and sbox #55

Add BRIN support for spoint and sbox #55

Uh oh!

Conversation

vitcpp commented Aug 16, 2023

Uh oh!

esabol left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

esabol Aug 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vitcpp Aug 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gbroccolo left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gbroccolo commented Aug 16, 2023

Uh oh!

vitcpp commented Aug 17, 2023

Uh oh!

vitcpp commented Aug 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vitcpp commented Aug 17, 2023

Uh oh!

esabol left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

esabol left a comment

Choose a reason for hiding this comment

Uh oh!

esabol Aug 17, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

esabol Aug 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

esabol Aug 16, 2023 •

edited

Loading

vitcpp Aug 17, 2023 •

edited

Loading

vitcpp commented Aug 17, 2023 •

edited

Loading

esabol Aug 17, 2023 •

edited

Loading

esabol Aug 18, 2023 •

edited

Loading